pglog : a flexible log analyser

Postgres Friends Drink
2021-11-24

Who am I?

Image by Anemone123 from Pixabay

Why reinvent the wheel?

  • Existing tools were frustrating
    • Can't analyse on one specific period of time
    • Statistics are not precise enough
Image by Andy Giraud from Pixabay

Why reinvent the wheel?

  • Examples:
    • Production peak analysis
    • In-depth analysis of temp file usage
Image by Andy Giraud from Pixabay

How to install it?

  • Never in production
  • Simple extension Everything you need is in the ReadMe
  • Dockerfile option
Image by Harry Strauss from Pixabay

With docker

wget https://gitlab.com/l_avrot/pglog/-/raw/master/docker/Dockerfile
docker build -t pglog_postgresql .
docker run -d --rm -P --name  pglog pglog_postgresql

How to use it?

  • Never in production
  • Postgres logs from Postgres >= 9.0
  • csv formatted logs

Logs with enough details

log_min_duration_statement=250ms
log_autovacuum_min_duration=0
log_checkpoints=on
log_connections=on
log_disconnections=on
log_lock_waits=on
log_temp_files=0
Image by TheUjulala from Pixabay

The pglog table

  • \d pglog
  • Partitionned table
  • FDW or physical table
Image by Jeong Eun Lee from Pixabay

Creating the pglog table

  • On a file
    pglog.create_logtable_from_file(file,
                    directory,
                    version
                    [, schema]
                    [, materialized]);
  • On a directory
    pglog.create_logtable_from_dir(directory,
                    version
                    [, schema]
                    [, materialized]);
Image by Jeong Eun Lee from Pixabay

Error report

select * from error_report;
 error_severity |   nb    | percentage
----------------+---------+------------
 FATAL          |       8 |       0.00
 ERROR          |     106 |       0.00
 WARNING        |       3 |       0.00
 LOG            | 6055776 |     100.00
(4 rows)
Image by Ryan McGuire from Pixabay

Detailed report for fatal messages

select * from error_analyze_fatal;
                       message                       | count |            min             |            max
-----------------------------------------------------+-------+----------------------------+----------------------------
 terminating connection due to administrator command |     7 | 2021-08-25 13:01:32.992+00 | 2021-08-27 09:12:44.884+00
 connection to client lost                           |     2 | 2021-08-25 12:20:24.044+00 | 2021-08-25 12:20:24.044+00
(2 rows) 
Image by Ryan McGuire from Pixabay

Autovacuum report

select * from autovacuum_report
order by vacuum_count desc fetch first 5 rows only;
 schema_name |  table_name   | vacuum_count | analyze_count
-------------+---------------+--------------+---------------
 pg_catalog  | pg_statistic  |           24 |             0
 pg_catalog  | pg_attribute  |           15 |            14
 pglog       | postgresqlwed |            8 |             7
 pg_catalog  | pg_class      |            7 |             7
 pg_catalog  | pg_type       |            6 |             6
(5 rows) 
Image by Ryan McGuire from Pixabay

Checkpoint stats

select * from checkpoints_stats;
§
 avg_checkpoint_freq | median_checkpoint_freq | avg_checkpoint_duration | median_checkpoint_duration
---------------------+------------------------+-------------------------+----------------------------
 06:01:55.491981     | 00:15:00.809           | 00:00:32.447288         | 00:00:06.6675
(1 row) 
Image by Ryan McGuire from Pixabay

More details

pglog=# select * from checkpoints_stats_by_hour;

 date_part | avg_checkpoint_freq | median_checkpoint_freq | avg_checkpoint_duration | median_checkpoint_duration
-----------+---------------------+------------------------+-------------------------+----------------------------
         6 | 03:31:43.989333     | 00:04:59.941           | 00:00:06.375            | 00:00:04.431
         7 | 22:36:05.281917     | 00:15:00.809           | 00:00:17.00225          | 00:00:03.916
         8 | 00:21:22.5615       | 00:15:01.661           | 00:00:17.9445           | 00:00:09.3765
         9 | 00:39:21.674167     | 00:30:17.2415          | 00:00:33.430833         | 00:00:11.2735
        10 | 00:16:09.9175       | 00:16:09.9175          | 00:00:06.032            | 00:00:06.032
        11 | 00:37:25.941667     | 00:12:22.328           | 00:01:40.044667         | 00:02:29.37
        12 | 01:11:27.7846       | 00:40:00.957           | 00:01:05.7842           | 00:00:16.016
        13 | 00:15:19.55725      | 00:07:56.6295          | 00:01:18.24225          | 00:01:21.253
        15 | 04:13:58.037        | 04:13:58.037           | 00:02:30.109            | 00:02:30.109
        16 | 02:58:23.038333     | 00:25:00.427           | 00:00:05.165667         | 00:00:05.012
        17 | 00:15:00.34025      | 00:07:30.346           | 00:00:04.27425          | 00:00:03.195
        19 | 01:02:31.026        | 01:02:31.026           | 00:00:06.4115           | 00:00:06.4115
        20 | 00:09:59.441        | 00:09:59.441           | 00:00:06.241            | 00:00:06.241 
Image by Ryan McGuire from Pixabay

Temp files

pglog=# select * from tempfile_report\gx

-[ RECORD 2 ]--+----------------------------------------------------------------------
min            | 8192 kB
max            | 374 GB
global_average | 57 GB
10%            | 5720 MB
20%            | 5720 MB
30%            | 5720 MB
40%            | 6000 MB
50%            | 32 GB
60%            | 33 GB
70%            | 36 GB
80%            | 142 GB
90%            | 144 GB
100%           | 374 GB 
Image by Ryan McGuire from Pixabay

Temp file queries

select * from tempfile_queries order by frequency
 fetch first 2 rows only\gx

-[ RECORD 1 ]--------------+----------------------------------------------------------------------
frequency                  | 154
query_tempfilesize_median  | 6000 MB
query_tempfilesize_average | 30 GB
total_size                 | 4655 GB
query                      | call create_mylogs (?,true); 
Image by Ryan McGuire from Pixabay

Slow queries

pglog=# select *
from top_n_slowest_queries
order by "average" desc
fetch first 1 rows only;

-[ RECORD 1 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
query   | with queries as ( select split_part(message, ? ) ) select query, count ? , avg(duration) as average, percentile_disc ? within group (order by duration) as median from queries group by query order by average desc limit ?
count   | 1
average | 26384.688000000000
median  | 26384.688
Image by Ryan McGuire from Pixabay

Session stats

select * from sessions_stats ;

 min_duration | max_duration  |  avg_duration   | user_name | database_name 
--------------+---------------+-----------------+-----------+---------------
 00:00:00.033 | 00:41:05.38   | 00:05:32.636933 | postgres  | mybench
 00:02:41.311 | 156:33:45.673 | 43:09:49.903    | postgres  | pglog
 00:00:03.538 | 00:00:06.231  | 00:00:04.8845   | postgres  | postgres
 00:00:00.003 | 00:00:00.013  | 00:00:00.007333 | postgres  | template1
Image by Ryan McGuire from Pixabay

Conections report

select * from connections_report ;

 user_name | database_name |        frequency        | connection_number 
-----------+---------------+-------------------------+-------------------
 postgres  | mybench       | 00:23:57.488429         |                14
 postgres  | pglog         | 10 days 12:05:08.038667 |                 3
 postgres  | postgres      | 00:04:20.521            |                 1
 postgres  | template1     | 00:00:00.519            |                 2
Image by Ryan McGuire from Pixabay

What really matters

  • Your own system is unique
  • with its own unique problems.
  • Craft your own queries!
https://gitlab.com/l_avrot/pglog
Image by Ryan McGuire from Pixabay

Questions?

Image by Tumisu from Pixabay